Abstract

Introduction: Over the last decade several breast cancer risk alleles have been identified which has led to an 
increased interest in individualised risk prediction for clinical purposes. 

Methods: We investigate the performance of an up-to-date set of 18 breast cancer risk single-nucleotide polymorphisms
(SNPs), together with mammographic percentage density (PD), body mass index (BMI) and clinical risk factors, in
predicting absolute risk of breast cancer, empirically, in a well-characterised Swedish case-control study of
postmenopausal women. We examined the efficiency of various prediction models at a population level for
individualised screening by extending a recently proposed analytical approach for estimating the number of cases
captured.

Results: The performance of a risk prediction model based on an initial set of seven breast cancer risk SNPs is
improved by additionally including eleven more recently established breast cancer risk SNPs (P = 4.69 x 10^-4).
Adding mammographic PD, BMI and all 18 SNPs to a Swedish Gail model improved the discriminatory accuracy
(the AUC statistic) from 55% to 62%. The net reclassification improvement was used to assess improvement in
classification of women into low, intermediate, and high categories of 5-year risk (P = 8.93 x 10^-9). For scenarios we
considered, we estimated that an individualised screening strategy based on risk models incorporating clinical risk 
factors, mammographic density and SNPs, captures 10% more cases than a screening strategy using the same 
resources, based on age alone. Estimates of numbers of cases captured by screening stratified by age provide 
insight into how individualised screening programs might appear in practice. 

Conclusions: Taken together, genetic risk factors and mammographic density offer moderate improvements to 
clinical risk factor models for predicting breast cancer. 



Introduction 

Breast cancer screening aims to detect the disease early 
in women and thereby reduce mortality from breast 
cancer. It may not be cost-effective to screen all women 
equally often, but rather to allocate resources dispropor- 
tionately across women at different risks of developing 
breast cancer. To identify high- and low-risk groups, a 
model for estimating a woman's individual risk is 
needed. One of the earliest and most widely used risk 
models for sporadic breast cancer is the Gail model [1]. 
The model uses the risk factors of current age, age at 
menarche, age at first live birth, number of previous
breast biopsies and first-degree relatives with breast can- 
cer and converts relative risk to absolute risk through 
use of baseline breast cancer incidence and mortality 
from other causes. Several studies have assessed the 
contribution of adding a measure of mammographic 
density to breast cancer risk prediction models [2-4] 
because mammographic density is one of the strongest 
risk factors for breast cancer with a high population 
attributable risk [5]. 

Over the past decade, several common, low pene- 
trance risk alleles for breast cancer have been identified 
by genome-wide association studies (GWAS), which has 
led to a recent increased interest in individualised risk 
prediction for clinical purposes [6,7]. The potential 



© 2012 Darabi et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.



Darabi et al. Breast Cancer Research 2012, 14:R25
http://breast-cancer-research.com/content/14/1/R25






impact of adding genetic information to the Gail model 
has been investigated by several researchers [2,8-10]. 
Gail [2] added seven breast cancer risk-associated single-nucleotide
polymorphisms (SNPs) to the standard Gail model and the
discriminatory accuracy improved from an area under the receiver
operating characteristic curve (AUC) of 60% to an AUC of 63%,
which was, however, less than the improvement found from adding
mammographic density to the Gail model. A further 11 independent
SNP associations have been recently validated in large 
GWAS and candidate gene studies, but their importance 
for risk prediction has not yet been thoroughly investi- 
gated [11-20]. 

Mealiffe et al. [8] studied prediction models based on 
the same seven SNPs as Gail, using data from the 
Women's Health Initiative clinical trial and a wider 
range of statistical methods. The authors studied 
changes in risk strata and provided evidence in favour 
of including genetic information in models for the pre- 
diction of breast cancer. Pharoah et al. [10] have further 
suggested that polygenic risk profiling may already pro- 
vide sufficient information to justify targeting breast 
cancer screening to those women at highest risk. Based 
on a simple analytical strategy, Pashayan et al. [21] 
recently investigated the implications for individualised 
screening in England, using the 18 currently established 
breast cancer risk SNPs. The authors compared the effi- 
ciency of an individualised screening approach based on 
a polygenic profile, with the efficiency of a standard 
approach to screening, based on age alone. 

We investigate the risk prediction performance of the 
currently established 18 breast cancer risk SNPs, empiri- 
cally, in a well-characterised case-control study of breast 
cancer in Swedish women, with data available on mam- 
mographic density, BMI and Gail model variables. We 
evaluate performance of various prediction models by 
receiver operating characteristic curve analysis and by
assessing reclassification of subjects into risk categories. 
We also evaluate the efficiency of individualised screen- 
ing by extending the analytical strategy of Pashayan et 
al. [21] to incorporate non-genetic risk factors and to 
compare performance of screening programs based on 
equal resources with different risk-prediction models. 
Presentation of results stratified by age provides insight 
into how individualised screening programs might 
appear in practice. 

Materials and methods 

Data 

The subjects included in the current study
are drawn from a population-based case-control study 
of postmenopausal breast cancer in women born in 
Sweden aged 50 to 74 years at the time of enrolment,
which was between 1 October 1993 and 31 March 1995.
Controls were randomly selected from the Swedish reg- 
ister of the total population and were frequency 
matched to the expected age distribution of the cases. 
Details on data collection and subjects have been 
described previously [22]. From the original case-control 
study, consisting of 3,345 cases and 3,454 controls, 
breast density measurements were available for 1,780 
cases and 1,701 controls. In all, 1,569 breast cancer 
cases and 1,730 healthy controls, from the original case- 
control study, were included in a genetic study. Among 
these women breast density measurements were avail- 
able for 1,022 cases and 868 controls. We carried out 
our analysis on three subsets: women with complete 
data on Gail, percentage density (PD) and body mass 
index (BMI) variables; women with complete data on 
Gail and SNP variables; and women with complete data 
on Gail, PD, BMI and SNP variables. 

The process of collecting mammographic density in 
the cases and controls included in this study has been 
described elsewhere [23]. In short, medio-lateral oblique 
views were used. For controls, the side was chosen ran- 
domly, whereas for cases the side contralateral to the 
tumour was used. The density resolution was set at 12- 
bit spatial resolution. Cumulus [24], a computer-assisted 
thresholding technique, was used to assess density on 
digitised film mammograms. For each image, a (single) 
trained observer set the appropriate gray-scale threshold 
levels defining the edge of the breast and distinguishing 
dense from non-dense tissue. The software calculated 
the total number of pixels within the entire region of 
interest and within the region identified as dense. The 
PD was then calculated from these values (dense area/ 
total breast area). The images were measured together
with approximately the same number of images from
healthy control women, and the reader was blinded to
case-control status. A random 10% of the images were 
included as replicates to assess the intra-observer relia- 
bility, which was high with a Spearman rank correlation 
coefficient of 0.92. 

Genotyping was performed at the National University 
of Singapore. Approval of the study was given by the 
Institutional Review Boards in Sweden and the National 
University of Singapore. 

Statistical analysis 

Gail et al. [1] presented a method to estimate the prob- 
ability that a woman, with a particular risk profile, in 
terms of age and other known risk factors will develop 
breast cancer during a specific time interval. The 
method can be used to combine case-control data with 
national registry data. Absolute risk is the probability 
that a subject who is free of the disease of interest at 
age $a$ will be diagnosed with that disease in a subsequent
age interval $[a, a + \delta]$, and can be written as:

\[
P(a, \delta, r(t)) = \int_a^{a+\delta} h_1(t)\, r(t)\, \exp\left\{-\int_a^t h_1(u)\, r(u)\, du\right\} \frac{S_2(t)}{S_2(a)}\, dt \tag{1}
\]

where $S_2(t) = \exp\{-\int_0^t h_2(u)\, du\}$ is the probability of
surviving competing risks up to age $t$. In this equation
the term $S_2(t)/S_2(a)$ corresponds to the conditional
probability of surviving other causes from age $a$ to $t$,
and the exponential term corresponds to surviving without
breast cancer from age $a$ to age $t$. At age $t$, there is
an instantaneous probability $h_1(t) r(t)\, dt$ of developing
breast cancer. The baseline hazard, $h_1(t)$, is estimated by
multiplying age-specific breast cancer incidence rates,
$h_1^*(t)$, by a conversion factor equal to one minus the
population attributable risk. The age-specific hazard of
dying from causes other than breast cancer is
represented by $h_2(t)$ and is assumed to be the same for
all individuals. The population attributable risk is a
function of the relative risk model $r(t)$, and can be
determined according to an approach described by
Bruzzi et al. [25]. In the current article, as
in Gail et al. [1], we work under the simplifying assumption
that $h_1$, $h_2$ and $r$ are constant within five-year
intervals. We estimated the age-specific breast cancer
incidence rates and hazard of dying from other causes
from the Swedish Cancer and Cause of Death registries,
respectively [see Table A1 in Additional file 1], and treated
these values as known, without error.
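Under the piecewise-constant assumption, the integral in the absolute risk formula has a closed form within each age band, so the calculation reduces to a short loop. The following is a minimal sketch of that calculation, not the authors' implementation: the function name, the band layout and the hazard values are hypothetical, and `rel_risk` is taken to be constant over the projection window.

```python
import math

def absolute_risk(age, horizon, bands, rel_risk=1.0):
    """Absolute risk of disease between `age` and `age + horizon`.

    bands: age-sorted list of (start_age, end_age, h1, h2), where h1 is the
    baseline disease hazard and h2 the competing mortality hazard, both per
    person-year and constant within the band (assumed layout).
    rel_risk: relative risk r, assumed constant over the window.
    """
    risk = 0.0
    event_free = 1.0  # probability of being alive and disease-free so far
    end = age + horizon
    for lo, hi, h1, h2 in bands:
        left, right = max(lo, age), min(hi, end)
        if right <= left:
            continue  # band does not overlap the projection window
        width = right - left
        disease_hazard = h1 * rel_risk
        total_hazard = disease_hazard + h2
        stay = math.exp(-total_hazard * width)
        if total_hazard > 0:
            # Closed-form integral of the disease density over this band
            risk += event_free * (disease_hazard / total_hazard) * (1.0 - stay)
        event_free *= stay
    return risk

# Hypothetical hazards for illustration only (not the Swedish registry values)
example_bands = [(50, 55, 0.0020, 0.0040), (55, 60, 0.0025, 0.0060)]
five_year = absolute_risk(50, 5, example_bands, rel_risk=1.5)
ten_year = absolute_risk(50, 10, example_bands, rel_risk=1.5)
```

With no competing mortality and a single band the result collapses to one minus the exponential of the cumulative disease hazard, which is a convenient sanity check.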

The Gail relative risk model [1] incorporates informa- 
tion on the risk factors age at menarche, age at first live 
birth, number of previous breast biopsies and first- 
degree relatives. We did have information on age at 
menarche and age at first live birth, but used family his- 
tory (binary) and benign breast disease (binary), respec- 
tively, as proxies for number of first-degree relatives and 
the number of previous breast biopsies. In our risk-pre- 
diction models, effect estimates for Gail risk factors [1], 
PD, BMI and the genetic markers were retrieved from 
the literature, except for the two Gail proxy variables. We
estimated the effect sizes of the proxy variables from
our own data by fitting a logistic regression model containing
both as main effects, together with an offset term equal to
the combined effect of age at menarche and age at
first live birth, based on published effect estimates.
We assumed a multiplicative penetrance model for the 
breast cancer-associated SNPs. In order to provide rela- 
tive odds of 1.0 or more for disease-associated alleles, 
where necessary the genotype scores were recoded such 
that the low-risk homozygote represented the baseline 
[2]. For SNPs with effect estimates from multiple
sources [11-20], we used the inverse variance method
([26], p. 375) to obtain a weighted average of effect
estimates from the separate studies.
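The pooling and the multiplicative model described above can be illustrated with a short sketch. It assumes that each study reports an odds ratio with a 95% confidence interval, from which the standard error of the log odds ratio is recovered; the function names and the study-level inputs are hypothetical, while the per-allele odds ratios 1.12 and 1.14 are the published values for rs11249433 and rs1045485 in Table 1.

```python
import math

def pooled_odds_ratio(estimates):
    """Inverse-variance weighted average on the log odds ratio scale.

    estimates: list of (odds_ratio, ci_low, ci_high) with 95% intervals.
    Returns (pooled odds ratio, standard error of the pooled log odds ratio).
    """
    num = 0.0
    den = 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        # Standard error recovered from the width of the 95% interval
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
        weight = 1.0 / se ** 2
        num += weight * math.log(odds_ratio)
        den += weight
    return math.exp(num / den), math.sqrt(1.0 / den)

def snp_relative_risk(per_allele_ors, risk_allele_counts):
    """Multiplicative model: each risk allele copy multiplies the odds."""
    log_rr = sum(count * math.log(odds_ratio)
                 for odds_ratio, count in zip(per_allele_ors, risk_allele_counts))
    return math.exp(log_rr)

# Hypothetical study-level estimates for a single SNP
pooled, pooled_se = pooled_odds_ratio([(1.10, 1.02, 1.19), (1.16, 1.05, 1.28)])

# Two risk alleles at rs11249433 and none at rs1045485
genotype_rr = snp_relative_risk([1.12, 1.14], [2, 0])
```

The pooled estimate always lies between the smallest and largest study estimates and is pulled towards the study with the narrower interval.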

Mammographic density has been consistently shown 
to be strongly associated with breast cancer and has pre- 
viously been considered in breast cancer risk-prediction 
models [3-5]. Due to the strong negative correlation 
between body size and mammographic density, the 
effect of density on breast cancer risk is underestimated 
if body size is not adjusted for. We therefore included 
BMI, together with PD in our risk prediction models. 
We used effect sizes obtained externally from a large 
sample of postmenopausal women, from [27], as esti- 
mates of risk (odds ratios) of breast cancer according to 
percent mammographic density (six categories), adjusted 
for BMI and as estimates of risk of breast cancer 
according to BMI (five categories), adjusted for density 
(Table 1). 

We used the Gail approach to estimate the 5-year and 
10-year absolute risk of breast cancer based on age and 
various combinations of genetic and non-genetic risk 
factors. We evaluated various models for breast cancer 
risk based on subsets of women with data on (i) Gail, 
PD and BMI variables, (ii) Gail and SNP variables and 
(iii) Gail, PD, BMI and SNP variables. 

We used the Hosmer-Lemeshow test to assess calibration
of the prediction models based on comparing
observed and expected outcomes within deciles of estimated
risk. As in Mealiffe et al. [8], we first fitted a
logistic regression model with a coefficient of one for
the logit of the absolute risk to estimate a location parameter
to account for the case-control design. We also
evaluated Brier scores [see Additional file 2]. To assess
discrimination we performed receiver operating characteristic
curve analysis, calculating the AUC statistic,
along with DeLong's non-parametric interval for AUC,
and assessed departure from a model with no diagnostic
capacity using the Mann-Whitney U test. We used the
non-parametric approach of DeLong et al. [28] to test
for differences in AUC.
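The link between the AUC and the Mann-Whitney U statistic can be made concrete: the AUC equals the proportion of case-control pairs in which the case has the higher predicted risk, with ties counted as one half. The sketch below is illustrative only; the function name and the risk values are hypothetical, and the analysis reported here used R packages rather than this code.

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC as the scaled Mann-Whitney U statistic (pairwise comparison)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical five-year absolute risks
example_auc = auc_mann_whitney([0.03, 0.05], [0.02, 0.04])
```

A value of 0.5 corresponds to a model with no diagnostic capacity, which is the null hypothesis tested by the Mann-Whitney U test.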

To assess the ability of a new test to reclassify subjects 
accurately into higher or lower risk categories, we evaluated
the two statistics suggested by Pencina et al. [29]
for assessing improvement in model performance 
accomplished by adding new explanatory variables, the 
net reclassification improvement (NRI) and the inte- 
grated discrimination improvement (IDI). We also 
examined the predictiveness curve [30]. 
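
For orientation, the two reclassification statistics are usually defined as below. This is a sketch using the standard definitions from the reclassification literature; the notation is introduced here for illustration and is not taken from the article. "Up" and "down" denote movement to a higher or lower risk category under the new model, and $\bar{p}$ denotes a mean predicted risk.

\[
\mathrm{NRI} = \left[ P(\text{up} \mid \text{case}) - P(\text{down} \mid \text{case}) \right] - \left[ P(\text{up} \mid \text{control}) - P(\text{down} \mid \text{control}) \right]
\]

\[
\mathrm{IDI} = \left( \bar{p}_{\text{new, cases}} - \bar{p}_{\text{old, cases}} \right) - \left( \bar{p}_{\text{new, controls}} - \bar{p}_{\text{old, controls}} \right)
\]

Both statistics are positive when the new model moves cases towards higher risk and controls towards lower risk, on balance.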

Table 1 Effect sizes for the 18 genomic loci, percentage mammographic density, body mass index and clinical risk
factors, used for risk prediction.

dbSNP No     Chromosome  OR^a   Reference, first author (year)                               OR (95% CI)^b         P^c
rs11249433   1           1.12   Turnbull (2010), Thomas (2009)                               1.12 (1.00 to 1.25)   4.3 x 10^-2
rs1045485    2           1.14   Turnbull (2010), Cox (2007)                                  1.08 (0.90 to 1.28)   4.1 x 10^-1
rs13387042   2           1.15   Turnbull (2010), Thomas (2009), Stacy (2007)                 1.21 (1.08 to 1.34)   6.0 x 10^-4
rs4973768    3           1.11   Turnbull (2010), Ahmed (2009)                                1.04 (0.94 to 1.16)   4.2 x 10^-2
rs10941679   5           1.19   Turnbull (2010), Stacy (2007)                                1.19 (1.06 to 1.34)   4.0 x 10^-3
rs889312     5           1.14   Turnbull (2010), Easton (2007)                               1.14 (1.01 to 1.28)   3.5 x 10^-2
rs2046210    6           1.27   Turnbull (2010), Zheng (2009)                                1.14 (1.01 to 1.27)   2.7 x 10^-2
rs13281615   8           1.10   Turnbull (2010), Easton (2007)                               1.19 (1.07 to 1.33)   1.6 x 10^-3
rs1011970    9           1.09   Turnbull (2010)                                              1.04 (0.90 to 1.21)   5.5 x 10^-1
rs2981582    10          1.26   Turnbull (2010), Easton (2007)                               1.28 (1.15 to 1.43)   6.0 x 10^-6
rs2380205    10          1.11   Turnbull (2010)                                              1.04 (0.93 to 1.16)   4.9 x 10^-1
rs10995190   10          1.16   Turnbull (2010)                                              1.12 (0.98 to 1.30)   9.9 x 10^-2
rs704010     10          1.07   Turnbull (2010)                                              1.07 (0.96 to 1.19)   2.5 x 10^-1
rs3817198    11          1.07   Turnbull (2010), Thomas (2009), Easton (2007)                1.01 (0.90 to 1.14)   8.1 x 10^-1
rs614367     11          1.15   Turnbull (2010)                                              1.36 (1.18 to 1.58)   8.3 x 10^-4
rs999737     14          1.09   Turnbull (2010), Thomas (2009)                               1.09 (0.96 to 1.25)   1.9 x 10^-2
rs3803662    16          1.20   Turnbull (2010), Thomas (2009), Easton (2007), Stacy (2007)  1.27 (1.13 to 1.43)   1.0 x 10^-4
rs6504950    17          1.05   Turnbull (2010), Ahmed (2009)                                1.11 (0.98 to 1.25)   1.6 x 10^-1

Risk factor                        OR^a   Reference
Percentage mammographic density
  0                                1.00   Boyd (2006)
  < 10%                            1.27
  10-25                            2.00
  25-50                            2.98
  50-75                            3.70
  > 75                             5.86
BMI (body mass index)
  < 21.79                          1.00   Boyd (2006)
  21.79-23.30                      1.16
  23.30-25.02                      1.13
  25.02-27.64                      1.28
  > 27.64                          1.67
Clinical factors
  Age at menarche                  1.10   Gail (1989)
  Age at first live birth          1.24   Gail (1989)
  Benign breast disease            1.65   Estimated from Swedish case-control data
  Family history                   2.07   Estimated from Swedish case-control data

a Published odds ratio (for SNPs with effect estimates from multiple sources, the inverse variance method was used to obtain a weighted average of the effect
estimates from the separate studies).
b Per-allele odds ratio (per copy of the high-risk allele in the Swedish case-control sample).
c P-values for tests of association based on likelihood-ratio tests, in the current Swedish case-control study.

For the English population, Pashayan et al. [21] evaluated the efficiency of individualised screening
strategies for breast cancer based on age and polygenic risk profiles. They evaluated the number of cases
potentially detectable, along with the number of women eligible for screening (in the population of women
aged 35 to 79 years), based on an individualised screening strategy of screening women aged 35 to 79 years
whose 10-year risk, evaluated as a function of age and polygenic profile, reaches 2.5%. Their approach
involves inferring points on the predictiveness curve in the population (aged 35 to 79 years) at large.
We extended the procedure (see below) to evaluate the potential impact of individualised screening in
Sweden. As mammography screening is offered to women aged 40 to 75 years in Sweden, we evaluated the
performance of a number of individualised screening approaches against a baseline (age-only) strategy of
screening all women aged 40 to 75 years. In Sweden, with its screening strategy, the 10-year absolute
risk of breast cancer reaches 2.5% by age 40 years and is thereafter above 2.5% all the way up to age
75 years (absolute risk values derived from Table A1 [see Additional file 1], using (1); data not shown).
In addition to a polygenic profile, we also incorporated Gail risk factors and PD in the calculation of
individualised risk scores. To calculate individualised risk scores we simulated a population of 100,000
women aged 40 to 75 years, according to the age distribution of the Swedish population. To generate
non-genetic risk factors for these women we sampled from our own controls, with replacement. As our data
consist of postmenopausal women, for women aged younger than 50 years we were forced to make additional
simplifying assumptions. We assumed that these women have the same age-conditional risk factor
distribution as women aged 50 years. We evaluated different screening strategies based on estimating the
proportion of the population that has an individualised risk greater than a given threshold (1.5%, 2% or
2.5%) and the proportion of cases that are expected to occur within the high-risk subgroup. Evaluating at
different thresholds enabled us to find screening strategies that use equal resources (% eligible for
screening) but are based on different risk-prediction models. We stratified our results in five-year age
intervals to shed light on how individualised screening strategies might appear in practice. Although the
analytical approach for calculating the proportion of cases captured by screening does not explicitly
model the process of evolving risk scores for individual women, dependence of the distributions of the
non-genetic risk factors on age is incorporated and age stratification provides valuable insights.
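
The two quantities compared across screening strategies, the share of women eligible and the share of cases expected among them, can be sketched for a simulated population as follows. This is an illustrative sketch rather than the authors' procedure: it assumes that each simulated woman's predicted absolute risk is well calibrated, so that the expected number of cases in a group is the sum of its risks, it treats risks at or above the threshold as eligible, and the function names and risk values are hypothetical.

```python
def screening_summary(risks, threshold):
    """Share of women eligible and share of expected cases captured."""
    eligible = [r for r in risks if r >= threshold]
    frac_eligible = len(eligible) / len(risks)
    # Expected cases are proportional to summed risk (calibration assumed)
    frac_cases = sum(eligible) / sum(risks)
    return frac_eligible, frac_cases

def threshold_for_coverage(risks, coverage):
    """Risk cut-off that makes roughly `coverage` of the population eligible."""
    ranked = sorted(risks, reverse=True)
    k = max(1, round(coverage * len(ranked)))
    return ranked[k - 1]

# Hypothetical 10-year risks for a tiny simulated population
population = [0.01, 0.02, 0.03, 0.04]
share_eligible, share_cases = screening_summary(population, 0.025)
cutoff = threshold_for_coverage(population, 0.5)
```

Fixing the coverage and comparing the captured share of cases across risk models is one way to compare strategies that use equal resources.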

All statistical analyses were performed using the free 
statistical software R [31] and R packages ROCR and 
PredictABEL. 

Results 

We examined several models for predicting absolute 
risk. We examined the effects of including the four Gail 
variables, with modified variable definitions, which we 
refer to as the Swe-Gail variables, as well as the effect of 
including PD and BMI. In all, 18 breast cancer susceptibility
loci with common risk alleles have been examined
in this study (Table 1); these are referred to as The18 herein. We
also selected the earlier known subset of seven markers
studied by Gail [2], referred to herein as The7.

We first examined the classification abilities of models
with and without PD and BMI, but including the Swe-Gail
risk factors (age at menarche, age at first live birth,
family history and benign breast disease), in 1,739 cases and
1,672 controls (Table 2). Without PD and BMI we
observed an AUC of 0.569 (95% confidence interval (CI)
= 0.550 to 0.588), compared with an AUC of 0.602 (95%
CI = 0.584 to 0.621) with PD and BMI. The difference
in AUCs was statistically significant (ΔAUC = 0.033, P
= 1.17 x 10^-7). Based on a subset of women with complete
data on Gail variables and SNPs, a statistically significant
improvement in AUCs was seen when adding
The7 to the Swe-Gail model. Improvement was further
enhanced when the recently discovered 11 SNPs were
added (ΔAUC = 0.018, P = 4.69 x 10^-4). Furthermore, a
gain in AUCs was observed from including these 11
SNPs when the baseline model also included PD and
BMI. We finally selected a subset of women with complete
data on Gail variables, SNPs, PD and BMI and
compared the performances of the Swe-Gail model and
a model additionally including PD, BMI and The18. The
latter model, referred to as the full model herein,
obtained an AUC of 0.619, improving the AUC by 0.067
(P = 3.24 x 10^-9). In this subset, with PD and BMI only
we observed an AUC of 0.541 (95% CI = 0.515 to
0.568), with The18 only we observed an AUC of 0.589
(95% CI = 0.563 to 0.614) and with PD, BMI and The18
we observed an AUC of 0.600 (95% CI = 0.575 to
0.626).

The values of absolute five-year risk of breast cancer
for the women included in our study, calculated at time
of sampling/diagnosis based on the Swe-Gail model and
the model additionally containing PD, BMI and The18,
are plotted in Figure 1. Complementing the Gail model
with PD and The18 increases the spread of the predicted
absolute risks. A marked difference in distributions
between cases and controls was observed for the
full model. The means of the absolute five-year risks
were 3.69% and 2.84% for cases and controls, respectively.
Of the controls and the cases, 47.9% and 64.8%,
respectively, had a five-year absolute risk higher than
2.5%. The difference in distributions between cases and
controls was more subtle for the Swe-Gail model.

Assuming three risk categories, we used reclassification
tables to compare pairs of models in terms of their
assignment of women to low $(0, s_1)$, intermediate $(s_1, s_2)$,
and elevated $(s_2, 1)$ risk categories, based on five-year
absolute risk estimates (Table 3). As cut-off values we
chose $s_1$ (= 2.41%) to correspond to the first quartile
of the estimated risk based on the Swe-Gail model and
$s_2$ (= 4.11%) to correspond to the third quartile. The
NRI value for the comparison of the Swe-Gail model
with the full model was 0.170 (Z = 5.750, P = 8.93 x 10^-9).
In total, 46% of women were reclassified. Reclassification
based on the full model was overall in the right
direction, with an upward shift in risk categories for
cases and a downward shift for controls. The global IDI
measure was 0.004 (Z = 5.742, P = 9.33 x 10^-9). Using
the cut-off values suggested by Mealiffe et al. [8], i.e. $s_1$
= 1.5%, $s_2$ = 2%, the NRI value for comparing the same
two models was estimated to be 0.193 (Z = 8.229, P =
2.22 x 10^-16).
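
The reported NRI and the overall percentage reclassified can be checked directly from the counts in Table 3. The sketch below uses the standard category-based NRI and reads the table with rows as the Swe-Gail category and columns as the full-model category (low, intermediate, high); the function and variable names are illustrative and not taken from the article.

```python
def reclassification_counts(table):
    """Return (moved up, moved down, total) for a square reclassification table.

    table[i][j] is the number of subjects in category i under the old model
    and category j under the new model, with categories ordered low to high.
    """
    size = len(table)
    up = sum(table[i][j] for i in range(size) for j in range(size) if j > i)
    down = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    total = sum(sum(row) for row in table)
    return up, down, total

# Counts from Table 3: rows = Swe-Gail category, columns = full-model category
controls = [[170, 20, 10], [236, 182, 62], [20, 65, 91]]
cases = [[155, 53, 17], [161, 225, 103], [14, 97, 192]]

case_up, case_down, n_cases = reclassification_counts(cases)
ctrl_up, ctrl_down, n_ctrls = reclassification_counts(controls)

# Net improvement among cases minus net (undesirable) upward movement among controls
nri = (case_up - case_down) / n_cases - (ctrl_up - ctrl_down) / n_ctrls
share_reclassified = (case_up + case_down + ctrl_up + ctrl_down) / (n_cases + n_ctrls)
```

Rounded to three and two decimals respectively, these counts give an NRI of 0.170 and 46% of women reclassified, in agreement with the values quoted above.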

Model calibration was assessed using the Hosmer-Lemeshow approach and by calculating Brier scores. All
models showed lack of fit; however, lack of model fit does not necessarily limit classification ability
based on estimated risks [32]. Results for the Swe-Gail model and the full model are displayed in Tables
A2 and A3 [see Additional file 1] and in Figure 2. Both the Brier score and the Hosmer-Lemeshow test
statistic values indicate an improvement in goodness of fit as a result of updating the Swe-Gail model
with PD, BMI and SNP data.

Table 2 Areas under the receiver operating characteristic curves for different combinations of prediction models.

OLD model                  NEW model                  Controls  Cases  OLD AUC (95% CI)^a     OLD P (AUC)^b   NEW AUC (95% CI)^a     NEW P (AUC)^b   P (ΔAUC)^c
Swe-Gail                   Swe-Gail, PD, BMI          1672      1739   0.569 (0.550 - 0.588)  3.00 x 10^-12   0.602 (0.584 - 0.621)  3.85 x 10^-25   1.17 x 10^-7
Swe-Gail                   Swe-Gail, The7             1527      1566   0.548 (0.527 - 0.568)  4.57 x 10^-6    0.597 (0.577 - 0.617)  9.98 x 10^-21   7.44 x 10^-17
Swe-Gail                   Swe-Gail, The18            1527      1566   0.548 (0.527 - 0.568)  4.57 x 10^-6    0.615 (0.595 - 0.634)  1.96 x 10^-28   1.54 x 10^-18
Swe-Gail, The7             Swe-Gail, The18            1527      1566   0.597 (0.577 - 0.617)  9.98 x 10^-21   0.615 (0.595 - 0.634)  1.96 x 10^-28   4.69 x 10^-4
Swe-Gail                   Swe-Gail, PD, BMI          856       1017   0.552 (0.526 - 0.578)  1.09 x 10^-4    0.571 (0.545 - 0.597)  1.06 x 10^-7    2.23 x 10^-7
Swe-Gail                   Swe-Gail, PD, BMI, The7    856       1017   0.552 (0.526 - 0.578)  1.09 x 10^-4    0.604 (0.579 - 0.630)  6.95 x 10^-15   1.19 x 10^-7
Swe-Gail                   Swe-Gail, PD, BMI, The18   856       1017   0.552 (0.526 - 0.578)  1.09 x 10^-4    0.619 (0.594 - 0.644)  6.16 x 10^-19   3.24 x 10^-9
Swe-Gail, PD, BMI          Swe-Gail, PD, BMI, The7    856       1017   0.571 (0.545 - 0.597)  1.06 x 10^-7    0.604 (0.579 - 0.630)  6.95 x 10^-15   9.50 x 10^-9
Swe-Gail, PD, BMI          Swe-Gail, PD, BMI, The18   856       1017   0.571 (0.545 - 0.597)  1.06 x 10^-7    0.619 (0.594 - 0.644)  6.16 x 10^-19   1.93 x 10^-9
Swe-Gail, PD, BMI, The7    Swe-Gail, PD, BMI, The18   856       1017   0.604 (0.579 - 0.630)  6.95 x 10^-15   0.619 (0.594 - 0.644)  6.16 x 10^-19   6.18 x 10^-3

a AUC and confidence interval (CI) evaluated using DeLong's non-parametric estimation.
b Null hypothesis of AUC = 0.5 assessed using the Mann-Whitney U test.
c Null hypothesis of ΔAUC = 0 assessed using DeLong's test.

One way to assess the predictive power of a model is 
to estimate the proportions of cases that are accounted 
for by given percentages of the population at the highest 
risk [30]. Figure 3 displays these proportions based on
the risk distribution generated by the Swe-Gail model,
and by the full model. For the full model the proportion 
of cases explained by the 20% of the population at the 
highest risk was equal to 40.1%, compared with 35.1% 
for the Swe-Gail model. 

For four personalised screening models, we compared the percentage of individuals eligible for screening
and the percentage of cases potentially detectable by screening, for eligible cases, against the (current)
approach of screening all women aged 40 to 75 years at three cut-offs of eligibility [see Table A4 of
Additional file 1]. The full model with eligibility for screening defined by an absolute risk cut-off
value of 2% has a slightly lower level of eligibility than the Swe-Gail model with a 2.5% cut-off for
eligibility (74% and 76%, respectively), but has a substantially higher catchment; for the full model 90%
of the cases are potentially screen detectable, while for the latter only 85% are potentially detectable.
As a consequence of adding SNPs, BMI and PD to the risk-prediction model, resources are more efficiently
re-allocated to women with high-risk profiles.

Figure 1 Distributions of estimated absolute risk by case-control status using the Swe-Gail model and the full model (with displayed
proportions of women with five-year absolute risks greater than multiples of 2.5%). [Panels: Swe-Gail; Swe-Gail, PD, BMI, The18.
Horizontal axes: predicted absolute risk for controls and for cases. Markers: 1-fold, 2-fold and 3-fold of the risk cut-off.]

Table 3 Reclassification for the Swe-Gail model compared with the full model, based on cut-off values determined by
first and third quartile of predicted risk by the Gail model.

                                      Full model
Swe-Gail model                        Low risk (< 2.41%)   Intermediate risk (2.41%-4.11%)   High risk (> 4.11%)   Reclassified (%)
Control subjects
  Low risk (< 2.41%)                  170                  20                                10                    15
  Intermediate risk (2.41%-4.11%)     236                  182                               62                    62
  High risk (> 4.11%)                 20                   65                                91                    48
Case subjects
  Low risk (< 2.41%)                  155                  53                                17                    31
  Intermediate risk (2.41%-4.11%)     161                  225                               103                   54
  High risk (> 4.11%)                 14                   97                                192                   37
Total sample
  Low risk (< 2.41%)                  325                  72                                27                    24
  Intermediate risk (2.41%-4.11%)     397                  407                               165                   58
  High risk (> 4.11%)                 34                   162                               283                   41



With the aim of comparing screening strategies with equal resources, we also calculated the number of
cases captured by screening based on the most efficient age-only screening program, and based on
individualised screening using the full model when confined to including only 76% of women aged 40 to 75
years. The percentage of all cases aged 40 to 75 years covered by the age-only program was 81%, which is
4 percentage points less than the program based on the Swe-Gail model (85%) and 10 percentage points less
than the program based on the full risk-prediction model, for which 91% of cases were screened. Results
are summarised in Table 4. Age-stratified percentages of cases eligible for screening together with the
percentages of cases covered by screening, for the three programs with 76% coverage, along with a
selection of models presented in Table A4, are presented in Table A5 [see Additional file 1].

Figure 3 Proportion of breast cancer cases explained by the proportion of the population at highest risk of the disease, for
the Swe-Gail model and the full model.

Discussion 

In the present study we have investigated the potential
gain in combining SNP information with clinical
information and mammographic percentage density for the
prediction of absolute risk of developing breast cancer,
in the Swedish population, utilising the Gail approach
[1]. We have examined several models for predicting
absolute risk, in particular examining the importance of
variables in the Gail model, mammographic percentage
density, the seven SNPs studied by Gail [2], and an
additional 11 SNPs that have recently been confirmed to
be associated with breast cancer risk. We provide
evidence that the AUC of the risk-prediction model based
on the initial seven breast cancer risk SNPs is improved
by additionally including the 11 more recently
established breast cancer risk SNPs (P = 4.69 x 10^-4).
We further show that including mammographic PD, BMI and
the 18 SNPs, in the baseline Swe-Gail model, is strongly
associated with positive reclassification (NRI = 0.170,
P = 8.93 x 10^-9).
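
As an illustration of how a categorical net reclassification improvement such as the NRI above is defined, the following is a minimal sketch in Python. It uses the three 5-year risk categories from the reclassification table (cut-offs 2.41% and 4.11%). The function names and the toy risk values are hypothetical and are not taken from the study; the sketch does not compute the associated P value.

```python
# Category boundaries for 5-year absolute risk: low < 2.41%,
# intermediate 2.41%-4.11%, high > 4.11%.
LOW_CUT, HIGH_CUT = 0.0241, 0.0411


def category(risk):
    """Return 0 (low), 1 (intermediate) or 2 (high) for an absolute risk."""
    if risk < LOW_CUT:
        return 0
    if risk <= HIGH_CUT:
        return 1
    return 2


def nri(old_risks, new_risks, is_case):
    """Categorical NRI:
    [P(up | case) - P(down | case)] + [P(down | control) - P(up | control)].
    """
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for old, new, case in zip(old_risks, new_risks, is_case):
        case = bool(case)
        total[case] += 1
        move = category(new) - category(old)
        if move > 0:
            up[case] += 1
        elif move < 0:
            down[case] += 1
    case_part = (up[True] - down[True]) / total[True]
    control_part = (down[False] - up[False]) / total[False]
    return case_part + control_part


# Hypothetical toy data (NOT from the study): four cases, four controls.
old = [0.020, 0.030, 0.030, 0.050, 0.030, 0.030, 0.020, 0.050]
new = [0.030, 0.045, 0.030, 0.030, 0.020, 0.030, 0.030, 0.030]
case = [1, 1, 1, 1, 0, 0, 0, 0]
toy_nri = nri(old, new, case)
print(f"toy NRI = {toy_nri:.2f}")
```

A positive value indicates that, on balance, cases move to higher risk categories and controls to lower ones under the new model.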

The value of the AUC statistic that we observed, for
assessing discrimination based on absolute risks
calculated from the Gail model, is low compared with
values observed in some studies carried out in the US.
Rockhill et al. [33] observed an AUC of 0.58 based on
the Nurses' Health Study and Gail [2] an AUC of 0.61
based on white women aged 50 years and over from the US
National Health Interview Survey. We note that for the
standard Gail model, the standard deviation of the log
relative risk estimated for our samples is lower than
the value estimated in Gail [2] (0.34 compared with
0.36) and that generally an increase in variability in
risk scores will be associated with an increased AUC
value [34]. Studies of the original Gail model have
reported AUC values ranging from 0.54 to 0.74, although
the values at both ends of this interval have been
observed in more 'extreme' populations (0.54 in a
cohort of US women aged 70 years and older [35] and
0.74 in a study of UK women aged 21 to 73 years from
a UK family history clinic [36]).
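
The link between the variability of risk scores and the AUC can be made concrete under an idealised model. The sketch below assumes (this is an assumption, not the calculation used in the study) that the log relative risk is normally distributed with standard deviation sd in non-cases, that the disease is rare, and that the model is well calibrated, so that the log relative risk in cases is shifted upwards by sd^2. Under these assumptions the AUC equals Phi(sd / sqrt(2)), which increases with sd. The function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binormal_auc(sd_log_rr):
    """Idealised AUC when log relative risk is N(mu, sd^2) in non-cases
    and N(mu + sd^2, sd^2) in cases.

    The case-minus-control difference is then N(sd^2, 2 * sd^2), so
    P(difference > 0) = Phi(sd / sqrt(2)).
    """
    return normal_cdf(sd_log_rr / sqrt(2.0))


# The two standard deviations of log relative risk quoted in the text.
for sd in (0.34, 0.36):
    print(f"sd = {sd:.2f} -> idealised AUC = {binormal_auc(sd):.3f}")
```

Observed AUC values can fall below this idealised figure when the assumptions (normality, calibration) do not hold in the study population.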

In our study we observed improvements of 2 to 3% in
AUC values as a result of adding mammographic density
to risk-prediction models, which is slightly more than
the 1% improvement observed by Tice et al. [3]. The
larger increase is likely to be partially due to the
good intra-observer reliability of the Cumulus method
used for measuring percentage density, compared with
the BI-RADS method used by Tice et al. [3]; see [37].
Chen et al. [4] estimated an increase in excess of 4%,
also using percentage density. In their study,
measurements of PD from mammograms were available for
slightly fewer than 50% of the individuals; statistical
modelling was used to infer PD in the remaining
subjects.

We examined the usefulness of the 18 markers on a
population level, with respect to screening. Using
published effect estimates for the 18 markers and the
clinical variables, we evaluated several approaches to
individualised screening, against age-only-based
screening, in women aged 40 to 75 years (at different
10-year risk cut-offs for defining eligibility for
screening). We showed for the Swedish female population
that a personalised screening approach based on a risk
prediction model incorporating age, Gail model
variables, PD, BMI and 18 SNPs captures significantly
more breast cancer cases than screening approaches
using equal resources based on age and Gail model
variables and on age alone. The individualised
screening strategies investigated here correspond very
loosely to a strategy where all women are screened at
baseline (e.g. at age 40 years) and at a small number
of further occasions (e.g. shortly after menopause, and
at age 65 years), in order to ascertain personalised
risk, and between these occasions women are recommended
to attend screening at intervals tailored to their
personalised risk. In practice, rather than reducing
the total number of mammograms, as in Pashayan et al.
[21] and in the simulation study herein, individualised
screening might in the first instance reallocate
existing resources unequally across women, according to
their risk.

Table 4 Percentage of cases detectable by screening for the screening strategies with 76% eligibility.

Model                      Cut-off a   Eligible b (%)   Cases screened c (%)   Mean (Sd) d
Age-Only                               76               81                     0.033 (-)
Swe-Gail                   0.0250      76               85                     0.034 (0.014)
Swe-Gail, PD, BMI, The18   0.0195      76               91                     0.037 (0.026)

a Absolute risk cut-off defining eligibility for screening.
b Percentage of individuals eligible for screening according to the risk distribution estimated by the specified model.
c Percentage of cases potentially detectable by screening in the population undergoing screening.
d Mean and standard deviation (Sd) of predicted absolute risk values.
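
To illustrate how a risk cut-off translates into a percentage of women eligible and a percentage of cases captured, the following is a rough Python sketch under a simplifying assumption that is not necessarily the one used in the study: predicted absolute risk is treated as log-normally distributed in the whole population, with parameters moment-matched to the Mean (Sd) column of Table 4, and age structure is ignored. Under a log-normal risk distribution with log-scale parameters (mu, sigma), the risk distribution among cases is log-normal with parameters (mu + sigma^2, sigma). The function names are illustrative.

```python
from math import erf, log, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lognormal_params(mean, sd):
    """Moment-match a log-normal: return (mu, sigma) of the log risk."""
    sigma2 = log(1.0 + (sd / mean) ** 2)
    return log(mean) - sigma2 / 2.0, sqrt(sigma2)


def eligible_and_captured(mean, sd, cutoff):
    """Fraction of the population above the cut-off, and fraction of
    cases arising in that group, assuming log-normal absolute risk."""
    mu, sigma = lognormal_params(mean, sd)
    z = (log(cutoff) - mu) / sigma
    eligible = 1.0 - normal_cdf(z)
    # Cases are risk-weighted, which shifts the log-scale mean by sigma^2.
    captured = 1.0 - normal_cdf(z - sigma)
    return eligible, captured


# Mean, Sd and cut-off as listed in Table 4.
models = {
    "Swe-Gail": (0.034, 0.014, 0.0250),
    "Swe-Gail, PD, BMI, The18": (0.037, 0.026, 0.0195),
}
for name, (mean, sd, cutoff) in models.items():
    eligible, captured = eligible_and_captured(mean, sd, cutoff)
    print(f"{name}: eligible {eligible:.1%}, cases captured {captured:.1%}")
```

The output can be compared with the Eligible and Cases screened columns of Table 4. Exact agreement is not expected, because this sketch pools all ages into a single log-normal distribution.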

It is now recognised that stratification according to
genetic risk scores may improve the efficiency of
screening programs [10]. With on-going genotyping
efforts by, among others, the Breast Cancer Association
Consortium [38], it is likely that in the near future
the number of established breast cancer risk SNPs will
increase markedly, potentially making a polygenic
approach to disease prevention a reality [11]. In the
future, other novel risk factors could potentially be
incorporated into the Gail approach, such as steroid
hormone levels, more detailed reproductive history and
novel measures of mammographic density, such as texture
features [39].

The strengths of the present study are the
population-based setting with a high participation rate
and the detailed information on key breast cancer risk
factors, including mammographic density. To our
knowledge this is the first study to assess the
prediction performance of the currently established 18
breast cancer risk SNPs empirically.

There are limitations to the present study. Two of the
variables in our prediction models varied slightly in
definition from those used in the standard Gail model.
For these variables we were forced to use internal
effect estimates. Any bias in estimating discriminatory
accuracy is, however, expected to be negligible.
Further limitations are that the study is focused on
postmenopausal women and that the family history
variable used in our study is very crude. More
sophisticated approaches have been described that more
specifically describe the nature of the family history
[40,41], using for example such variables as number and
types of relatives affected with breast cancer (plus
the ages at which they developed breast cancer),
special risk factors such as BRCA1 and BRCA2 gene
mutations and family history of cancers at other sites.

The approach used for assessing efficiency of
individualised screening programs is simplistic. It
assumes that women being screened are under constant
surveillance and that cancer is instantaneously
detectable without error. Moreover, our approach was
based on further simplifying assumptions, for example
that effect sizes are age independent. Related to this
particular condition is our assumption that women aged
less than 50 years have the same risk distribution as
women aged 50 years. However, in reality the relative
risk associated with family history is higher at
younger ages [42]. One way to relax our assumption
would be to incorporate interaction effects between age
and family history. Effects of other risk factors (e.g.
breast density) may also vary with age, but the
approach becomes unwieldy, and estimates become
unstable, if we account for age-dependent effects of
several risk factors. An approach to examining
sensitivity of our results to our assumption, which
addresses the issue more generally, is to investigate
what happens when we increase or decrease the variance
of the risk scores in the women aged less than 50 years
by a fixed factor. When we increased the variance of
the log relative risks by a fixed factor (10% increase)
we observed an increase in the percentage of cases
screened (approximately 1%) along with a very small
increase (less than 1%) in the percentage of
individuals eligible for screening, across all three
considered prediction models, and advantages of the
full model, compared with sub-models, were still
observed (data not shown).
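
The direction of such a sensitivity analysis can be illustrated with a small Python sketch. It assumes log-normally distributed absolute risk within a single hypothetical stratum of younger women whose median risk lies below the screening cut-off; the median and standard deviation used here are invented for illustration and are not estimates from the study. The variance of the log risk is inflated by 10% while the median is held fixed, which is one possible reading of the procedure described above.

```python
from math import erf, log, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def tail_fractions(median_risk, sigma, cutoff):
    """Return (eligible, captured): the fraction of women above the
    cut-off and the fraction of cases arising among them, assuming
    log-normal absolute risk with the given median and log-scale sd."""
    z = log(cutoff / median_risk) / sigma
    eligible = 1.0 - normal_cdf(z)
    captured = 1.0 - normal_cdf(z - sigma)  # cases are risk-weighted
    return eligible, captured


# Hypothetical stratum of women aged under 50 (values NOT from the study).
MEDIAN_RISK = 0.012
SIGMA = 0.60
CUTOFF = 0.0195  # full-model cut-off listed in Table 4

base = tail_fractions(MEDIAN_RISK, SIGMA, CUTOFF)
# Inflate the variance of the log risk by 10% (sd scales by sqrt(1.10)).
inflated = tail_fractions(MEDIAN_RISK, SIGMA * sqrt(1.10), CUTOFF)

print(f"baseline: eligible {base[0]:.1%}, cases captured {base[1]:.1%}")
print(f"inflated: eligible {inflated[0]:.1%}, cases captured {inflated[1]:.1%}")
```

When the cut-off lies above the stratum's median risk, a wider risk distribution places more women, and more cases, above the cut-off, which is qualitatively in line with the increases reported above.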

More refined approaches for evaluating screening
strategies need to be developed and applied. It is
important to incorporate breast cancer mortality as
well as incidence and to at least partially reflect
that breast cancer is a complex disease with a number
of subtypes (which receive different treatments) and
that patient survival outlooks vary. Accurately
predicting an individual's risk of developing and dying
from breast cancer remains a challenge. Microsimulation
may prove a useful tool for accounting for the
complicated processes of disease progression and
detection when evaluating the efficiency of screening
strategies [43]. Using microsimulation, it would be
possible to assess refined strategies, for example,
where screening intervals are defined as functions of
breast cancer risk, and to consider other aspects such
as possible over-diagnosis and screening sensitivity.

Conclusions 

Taken together, genetic risk factors and mammographic
density offer moderate improvements to clinical risk
factor models for predicting breast cancer.
Additional file 1: Supplementary tables. (a1) Age specific composite
(h1(t)) and competing mortality rates (h2(t)), for breast cancer using
2005 data from the Swedish cancer registry and cause of death registry
(per 100,000). (a2) Measures of model calibration and discrimination for
the Swe-Gail model and the full model. (a3) Expected and observed
counts of case patients for subgroups of predicted risk for Swe-Gail
model and the full model. (a4) Percentage of individuals eligible for
screening and the percentage of cases potentially detectable by
screening in the population undergoing screening, across different
(personalised) screening strategies based on different cut-offs of 10-year
absolute risk for developing breast cancer. (a5) Percentage of individuals
eligible for screening and the percentage of cases potentially detectable
by screening in the population undergoing screening, across different
screening strategies based on different cut-offs of 10-year absolute risk for
developing breast cancer, stratified by age.

Additional file 2: Supplementary methods. Full methods 
accompanying this manuscript. 